Create an interactive Parallel Plot

To demonstrate the use of the interactive parallel plot, we use a project already loaded into the CKG database.

[1]:

import pandas as pd

from ckg.report_manager import project, dataset, report
from ckg.analytics_core.viz import viz as plots

import networkx as nx
from networkx.readwrite import json_graph

from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

from scipy.stats import zscore
init_notebook_mode(connected=True)
%matplotlib inline

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

c:\users\sande\.conda\envs\pip_rev\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning:

The package pingouin is out of date. Your version is 0.3.11, the latest is 0.3.12.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.

WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.

We create a new project object and load the respective data and report

[2]:

my_project = project.Project(identifier='P0000001', datasets={}, report={})
my_project.load_project_data()
my_project.load_project_report()

We can now access all the results for each data type

[3]:

my_project.list_datasets()

[3]:

dict_keys(['clinical', 'multiomics', 'proteomics'])

We will use the results from the proteomics analyses. We access the dataset ‘proteomics’ for further analysis

[4]:

proteomics_dataset = my_project.get_dataset('proteomics')

The available analyses for this dataset are:

[5]:

my_project.get_dataset('proteomics').list_dataframes()

[5]:

['complex_associations',
 'correlation_correlation',
 'disease_associations',
 'drug_associations',
 'go annotation',
 'go_enrichment_Biological_processes_regulation_enrichment',
 'interaction_network',
 'literature_associations_publications_abstracts',
 'number of modified proteins',
 'number of peptides',
 'number of proteins',
 'original',
 'overview statistics_summary',
 'pathway annotation',
 'pathway_enrichment_Pathways_regulation_enrichment',
 'processed',
 'protein biomarkers',
 'regulated',
 'regulation table',
 'tissue qcmarkers']

We can access the different dataframes like this:

[6]:

my_project.get_dataset('proteomics').get_dataframe('go annotation')

[6]:

	annotation	group	identifier	source
0	mitochondrial genome maintenance	None	TYMP~P19971	UniProt
1	maltose metabolic process	None	MGAM~O43451	UniProt
2	maltose metabolic process	None	GAA~P10253	UniProt
3	ribosomal large subunit assembly	None	RPL11~P62913	UniProt
4	ribosomal large subunit assembly	None	RPL6~Q02878	UniProt
5	ribosomal large subunit assembly	None	RPL3~P39023	UniProt
6	ribosomal large subunit assembly	None	RPLP0~P05388	UniProt
7	ribosomal small subunit assembly	None	RPS28~P62857	UniProt
8	ribosomal small subunit assembly	None	RPS5~P46782	UniProt
9	ribosomal small subunit assembly	None	RPS14~P62263	UniProt
10	ribosomal small subunit assembly	None	RPS19~P39019	UniProt
11	ribosomal small subunit assembly	None	RPS27~P42677	UniProt
12	very long-chain fatty acid metabolic process	None	ACAA1~P09110	UniProt
13	autophagosome assembly	None	RAB1A~P62820	UniProt
14	autophagosome assembly	None	NSFL1C~Q9UNZ2	UniProt
15	autophagosome assembly	None	UBQLN1~Q9UMX0	UniProt
16	autophagosome assembly	None	RAB7A~P51149	UniProt
17	urea cycle	None	ASS1~P00966	UniProt
18	urea cycle	None	CPS1~P31327	UniProt
19	urea cycle	None	OTC~P00480	UniProt
20	urea cycle	None	ARG1~P05089	UniProt
21	urea cycle	None	ASL~P04424	UniProt
22	citrulline metabolic process	None	ASS1~P00966	UniProt
23	argininosuccinate metabolic process	None	ASS1~P00966	UniProt
24	ribosomal subunit export from nucleus	None	RAN~P62826	UniProt
25	ribosomal subunit export from nucleus	None	EIF6~P56537	UniProt
26	ribosomal large subunit export from nucleus	None	RAN~P62826	UniProt
27	ribosomal large subunit export from nucleus	None	NPM1~P06748	UniProt
28	ribosomal small subunit export from nucleus	None	NPM1~P06748	UniProt
29	ribosomal small subunit export from nucleus	None	RAN~P62826	UniProt
...	...	...	...	...
17753	negative regulation of extrinsic apoptotic sig...	None	SCG2~P13521	UniProt
17754	negative regulation of extrinsic apoptotic sig...	None	GSTP1~P09211	UniProt
17755	negative regulation of extrinsic apoptotic sig...	None	LMNA~P02545	UniProt
17756	negative regulation of extrinsic apoptotic sig...	background	THBS1~P07996	UniProt
17757	positive regulation of extrinsic apoptotic sig...	None	PTPRC~P08575	UniProt
17758	positive regulation of extrinsic apoptotic sig...	background	AGT~P01019	UniProt
17759	positive regulation of extrinsic apoptotic sig...	None	BID~P55957	UniProt
17760	positive regulation of extrinsic apoptotic sig...	background	PDIA3~P30101	UniProt
17761	positive regulation of extrinsic apoptotic sig...	None	PAK2~Q13177	UniProt
17762	positive regulation of extrinsic apoptotic sig...	None	PYCARD~Q9ULZ3	UniProt
17763	regulation of extrinsic apoptotic signaling pa...	None	FGFR1~P11362	UniProt
17764	negative regulation of extrinsic apoptotic sig...	background	PRDX2~P32119	UniProt
17765	negative regulation of extrinsic apoptotic sig...	None	COL2A1~P02458	UniProt
17766	positive regulation of extrinsic apoptotic sig...	None	PPP1CA~P62136	UniProt
17767	regulation of intrinsic apoptotic signaling pa...	None	PYCARD~Q9ULZ3	UniProt
17768	negative regulation of intrinsic apoptotic sig...	None	DDX3X~O00571	UniProt
17769	positive regulation of intrinsic apoptotic sig...	background	S100A8~P05109	UniProt
17770	positive regulation of intrinsic apoptotic sig...	None	BID~P55957	UniProt
17771	positive regulation of intrinsic apoptotic sig...	None	SLC9A3R1~O14745	UniProt
17772	positive regulation of intrinsic apoptotic sig...	background	S100A9~P06702	UniProt
17773	regulation of phosphatidylcholine biosynthetic...	None	FABP3~P05413	UniProt
17774	regulation of store-operated calcium entry	None	CD84~Q9UIB8	UniProt
17775	regulation of store-operated calcium entry	None	STC2~O76061	UniProt
17776	regulation of store-operated calcium entry	None	STIM1~Q13586	UniProt
17777	positive regulation of cation channel activity	None	CTSS~P25774	UniProt
17778	regulation of semaphorin-plexin signaling pathway	background	NCAM1~P13591	UniProt
17779	negative regulation of cysteine-type endopepti...	None	PARK7~Q99497	UniProt
17780	positive regulation of cysteine-type endopepti...	background	GSN~P06396	UniProt
17781	positive regulation of cysteine-type endopepti...	None	FAS~P25445	UniProt
17782	negative regulation of cysteine-type endopepti...	None	PAK2~Q13177	UniProt

17783 rows × 4 columns

In this case, we will use the processed dataframe, which contains transformed and imputed LFQ intensities. We then normalize each protein column using the Z-score.

[7]:

proteomics_dataset = my_project.get_dataset('proteomics')
processed_df = proteomics_dataset.get_dataframe('processed')

[8]:

processed_df.head()

[8]:

	A2M~P01023	A30~A2MYE2	ABI3BP~Q7Z7G0	ACE~P12821	ACTB~P60709	ACTN1~P12814	ADA2~Q9NZK5	ADAMTS13~Q76LX8	ADAMTSL4~Q6UY14	ADH4~P08319	...	VIM~P08670	VK3~A2N2F4	VNN1~O95497	VTN~P04004	VWF~P04275	YWHAZ~P63104	group	sample	scFv~Q65ZC9	subject
0	38.005564	28.173504	21.588427	22.213865	27.090330	25.039968	23.442151	24.010605	25.085820	23.389032	...	24.178889	25.835908	22.480055	32.815815	28.922779	19.246215	Cirrhosis	AS1181	27.788928	S368
1	37.309118	27.981907	27.342062	23.847270	27.461155	25.896268	23.754503	24.135818	19.241174	22.148706	...	23.709777	25.004889	23.852908	32.722121	29.881279	22.141285	Cirrhosis	AS1182	26.869972	S369
2	37.384952	28.857627	20.156993	22.863630	27.929764	24.295225	23.359443	24.121788	24.923476	23.017163	...	23.599064	26.271650	24.232132	32.755752	29.444625	18.901149	Cirrhosis	AS1184	28.069328	S371
3	38.417225	28.978380	25.501910	22.992774	27.152479	25.231288	23.701340	24.568309	24.878802	26.388112	...	24.179076	25.929200	24.269047	32.714014	29.397176	22.216971	Cirrhosis	AS1185	28.170209	S372
4	37.471303	28.748744	20.658038	21.949025	27.537048	22.392992	22.406264	24.961173	22.246468	24.339540	...	23.865224	26.701340	20.490667	32.722691	28.540895	20.797497	Cirrhosis	AS1186	28.612280	S373

5 rows × 517 columns

[9]:

processed_df = processed_df.drop(['sample', 'subject'], axis=1).set_index('group').apply(zscore).reset_index()
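The normalization step can be tried on a small table without a CKG database. The sketch below is only an illustration: the dataframe, the protein names `PROT_A` and `PROT_B`, and all values are made up. It applies the same chain of calls as the cell above, so the metadata columns are dropped, `group` is kept out of the calculation by moving it to the index, and each remaining column is standardized on its own.

```python
import pandas as pd
from scipy.stats import zscore

# Toy stand-in for the processed dataframe (names and values are made up).
toy_df = pd.DataFrame({
    "group": ["Cirrhosis", "Cirrhosis", "Healthy", "Healthy"],
    "sample": ["s1", "s2", "s3", "s4"],
    "subject": ["a", "b", "c", "d"],
    "PROT_A": [38.0, 37.0, 35.0, 34.0],
    "PROT_B": [21.0, 27.0, 20.0, 24.0],
})

# Drop the metadata columns, keep 'group' as the index so it is not scored,
# and standardize each protein column independently.
toy_z = (
    toy_df.drop(["sample", "subject"], axis=1)
    .set_index("group")
    .apply(zscore)
    .reset_index()
)

print(toy_z.round(3))
```

Note that `scipy.stats.zscore` uses the population standard deviation (`ddof=0`) by default, so each scored column has mean 0 and a population standard deviation of 1.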

In order to find clusters of proteins, we access the report and the protein-protein correlation network as a dictionary.

[10]:

proteomics_report = my_project.get_dataset('proteomics').report
proteomics_report.list_plots()

[10]:

dict_keys(['0_date', '0~proteomics_pipeline~cytoscape_network', '10~regulation_description~description', '11~regulation_anova~basicTable', '12~regulation_anova~volcanoplot', '13~correlation_correlation~network', '14~interaction_network~network', '15~complex_associations~basicTable', '16~drug_associations~basicTable', '17~disease_associations~basicTable', '18~literature_associations_publications_abstracts~basicTable', '19~literature_associations_publications_abstracts~wordcloud', '1~overview statistics_summary~multiTable', '20~go_enrichment_Biological_processes_regulation_enrichment~basicTable', '21~pathway_enrichment_Pathways_regulation_enrichment~basicTable', '2~proteins~barplot', '3~proteins~basicTable', '4~coefficient_variation_coefficient_of_variation~scatterplot_matrix', '5~quality_control_qcmarkers~qcmarkers_boxplot', '6~ranking_ranking_with_markers~ranking', '7~ranking_ranking_with_markers~basicTable', '8~stratification_description~description', '9~stratification_pca~pca'])

[14]:

correlation_net_dict = proteomics_report.get_plot('13~correlation_correlation~network')[0]

To convert the dictionary into a network, we access the JSON version stored under the 'net_json' key and convert it using the NetworkX package.

[15]:

correlation_net = json_graph.node_link_graph(correlation_net_dict['net_json'])
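To see what the node-link conversion does, here is a minimal round trip on a hypothetical three-node network. The node names, colors, and cluster labels are invented for illustration; the real `net_json` content comes from the CKG report and may carry other attributes. The graph is serialized with `node_link_data` and rebuilt with `node_link_graph`, so both calls use the same default key names whatever the installed NetworkX version.

```python
import networkx as nx
from networkx.readwrite import json_graph

# Hypothetical two-cluster network; names and attributes are illustrative only.
toy_net = nx.Graph()
toy_net.add_node("PROT_A", cluster=0, color="#1f77b4")
toy_net.add_node("PROT_B", cluster=0, color="#1f77b4")
toy_net.add_node("PROT_C", cluster=1, color="#ff7f0e")
toy_net.add_edge("PROT_A", "PROT_B", weight=0.9)

# Serialize to a JSON-compatible dict, then rebuild the graph from it.
net_json = json_graph.node_link_data(toy_net)
restored = json_graph.node_link_graph(net_json)

# Node attributes survive the round trip.
print(sorted(restored.nodes(data="cluster")))
```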

Now that we have a network with proteins colored by cluster, we can convert this information into a dataframe to be used in this Jupyter Notebook.

[16]:

correlation_df = pd.DataFrame.from_dict(correlation_net.nodes(data=True))
correlation_df = correlation_df[0].to_frame().join(correlation_df[1].apply(pd.Series))

[17]:

correlation_df.columns = ['identifier', 'degree', 'radius', 'color', 'cluster']
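Assigning column names by position relies on the node attributes always arriving in the same order. As an alternative sketch (not the approach used in the notebook), the node attributes can be loaded with `orient="index"` so that the columns are named after the attributes themselves. The toy network below is hypothetical and only assumes the four attribute names used above.

```python
import networkx as nx
import pandas as pd

# Hypothetical network carrying the same node attributes as the CKG one.
toy_net = nx.Graph()
toy_net.add_node("PROT_A", degree=2, radius=5, color="#1f77b4", cluster=0)
toy_net.add_node("PROT_B", degree=1, radius=3, color="#1f77b4", cluster=0)
toy_net.add_node("PROT_C", degree=1, radius=3, color="#ff7f0e", cluster=1)

# One row per node, one column per attribute; the node name becomes 'identifier'.
nodes_df = (
    pd.DataFrame.from_dict(dict(toy_net.nodes(data=True)), orient="index")
    .rename_axis("identifier")
    .reset_index()
)

# Select columns by name rather than relying on their position.
nodes_df = nodes_df[["identifier", "degree", "radius", "color", "cluster"]]
print(nodes_df)
```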

Since the correlation network was generated using a cut-off, not all the proteins in the processed dataframe are part of a cluster. We therefore filter the processed dataframe and keep only the proteins that are present in the correlation clusters.

[18]:

min_val = processed_df._get_numeric_data().min().min().round()
max_val = processed_df._get_numeric_data().max().max().round()
processed_df = processed_df[list(correlation_df.identifier) + ['group']]
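The column filter above raises a `KeyError` if the network contains an identifier that is missing from the processed dataframe. A defensive variant, shown here on made-up data with a hypothetical missing identifier `PROT_X`, keeps only the identifiers that actually exist as columns.

```python
import pandas as pd

# Toy z-scored dataframe (values are made up).
toy_processed = pd.DataFrame({
    "group": ["Cirrhosis", "Healthy"],
    "PROT_A": [0.5, -0.5],
    "PROT_B": [1.0, -1.0],
    "PROT_C": [-0.2, 0.2],
})

# Identifiers taken from the clusters; PROT_X is deliberately absent above.
clustered_ids = ["PROT_A", "PROT_C", "PROT_X"]

# Keep only identifiers that are real columns, then append the group column.
present = [p for p in clustered_ids if p in toy_processed.columns]
filtered = toy_processed[present + ["group"]]

print(filtered)
```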

Ready! To build the parallel plot, we create a dictionary with the clusters and their respective colors, and filter the processed dataframe to include only the proteins in a specific cluster.

Using the Jupyter Widgets interact function, we can make the plot interactive and allow the visualization of a cluster selected by the user.

[19]:

# display and HTML are part of the public IPython.display module
from IPython.display import display, HTML

[20]:

@interact
def plot_parallel_plot(cluster=correlation_df.cluster.unique()):
    cluster_colors = dict(zip(correlation_df.cluster, correlation_df.color))
    clusters = correlation_df.groupby('cluster')
    identifiers = clusters.get_group(cluster)['identifier'].tolist()
    title= "Parallel plot cluster: {}".format(cluster)
    df = processed_df.set_index('group')[identifiers].reset_index()
    figure = plots.get_parallel_plot(df, identifier=cluster, args={'color':cluster_colors[cluster],'group':'group',
                                                                          'title':title,
                                                                          'zscore':False})
    display(HTML("<p>{}</p>".format(",".join(identifiers))))
    iplot(figure.figure)
